JOINT ESTIMATION AND MODEL ORDER SELECTION FOR ONE
DIMENSIONAL ARMA MODELS VIA CONVEX OPTIMIZATION: A NUCLEAR
NORM PENALIZATION APPROACH

STEPHANE CHRETIEN, TIANWEN WEI AND BASAD ALI HUSSAIN AL-SARRAY 


Abstract. The problem of estimating ARMA models is computationally interesting due to the
non-concavity of the log-likelihood function. Recent results were based on convex minimization. Joint
model selection using penalization by a convex norm, e.g. the nuclear norm of a certain matrix related
to the state space formulation, was extensively studied from a computational viewpoint. The goal of
the present short note is to present a theoretical study of a nuclear norm penalization based variant
of the method of [2] under the assumption of a Gaussian noise process.

Keywords: ARMA models, Time series, Low rank model, Prediction, Nuclear norm penalization. 

1. Introduction 

The auto-regressive moving average (ARMA) model is central to the field of time series analysis
and has been studied since the early thirties in the field of econometrics [12]. ARMA time series are
sequences of the form $(x_t)_{t\in\mathbb N}$ satisfying the following recursion

(1.1) $x_t = \sum_{i=1}^{p} a_i x_{t-i} + \sum_{j=1}^{q} b_j e_{t-j} + e_t$

for all $t \ge \max\{p, q\}$, and we focus on the case where $(e_t)_{t\in\mathbb N}$ is a sequence of zero mean independent
identically distributed Gaussian random variables whose variance is denoted, for simplicity, by Q. As
is well known [12], time series models are adequate for a wide range of phenomena in economics,
engineering, social science, epidemiology, ecology, signal processing, etc. They can also be helpful as
a building block in more complicated models such as GARCH models, which are particularly useful
in financial time series analysis.
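The recursion (1.1) can be illustrated with a short simulation. The following is only a minimal sketch: the function name `simulate_arma`, the coefficient values and the convention that all pre-sample values of $x$ and $e$ are zero are assumptions made for illustration, not part of the note.

```python
import numpy as np

def simulate_arma(a, b, n, sigma=1.0, seed=0):
    """Simulate x_t = sum_i a_i x_{t-i} + sum_j b_j e_{t-j} + e_t for t = 0..n-1.

    Assumption of this sketch: x_t = e_t = 0 for t < 0.
    """
    rng = np.random.default_rng(seed)
    e = sigma * rng.standard_normal(n)      # i.i.d. zero mean Gaussian noise
    x = np.zeros(n)
    for t in range(n):
        ar = sum(a[i] * x[t - 1 - i] for i in range(len(a)) if t - 1 - i >= 0)
        ma = sum(b[j] * e[t - 1 - j] for j in range(len(b)) if t - 1 - j >= 0)
        x[t] = ar + ma + e[t]
    return x, e

# Hypothetical ARMA(2, 1) example.
x, e = simulate_arma(a=[0.5, -0.2], b=[0.4], n=200)
```

The returned noise sequence `e` is kept so that individual steps of the recursion can be checked by hand.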

Two problems are to be addressed when studying ARMA time series: 

(1) estimate p and q, the intrinsic orders of the model. 

(2) estimate $a = (a_1, a_2, \ldots, a_p)$ and $b = (b_1, b_2, \ldots, b_q)$.

In the case where q = 0, the convention is to write the model as:

(1.2) $x_t = \sum_{i=1}^{p} a_i x_{t-i} + e_t$

and $(x_t)$ is simply called an AR process. Estimation of $a$ is often performed using the conditional
likelihood approach, given $x_0, \ldots$, yielding the standard Yule-Walker equations. On the other hand,
the model order selection problem is often addressed using a penalized log-likelihood approach such
as AIC or BIC; one may also use the plain likelihood. We refer the reader to the standard text of Brockwell
and Davis for more details on these standard problems. Turning back to the full ARMA model, it is
well known that the log-likelihood is not a concave function, and that multiple stationary points exist
which can lead to severe bias when using local optimization routines such as gradient or Newton-type
methods for the joint estimation of $a$ and $b$. In Shumway and Stoffer [12], an iterative procedure
resembling the EM algorithm is proposed, which seems more appropriate for the ARMA model than
standard optimization algorithms. However, no convergence guarantee towards a global maximizer is
provided. Concerning the model selection problem, penalties play a prominent role in modern
statistical theory and practice, in particular since the recent successes of the LASSO in regression and
its multiple generalizations. Nuclear norm penalization has played an important role in many problems
in engineering, machine learning and statistics such as matrix completion, etc. Application of nuclear
norm penalization to state space model estimation and model order selection using a moment-like
estimator in a convex optimization framework is proposed in [6]. The approach of [6] is a remarkable
contribution since convex model selection and state space estimation were combined for the first time
in the problem of time series. However, the approach of [6] is not yet supported by any theoretical
guarantee. Another approach for State Space model estimation was proposed in [2], where good
practical performance is reported and an asymptotic analysis is provided. This method, as well as the
unpenalized version of the method in [6], can be recast into the family of subspace methods; see [15].
In such subspace-type methods, model order selection and model estimation are decoupled and it is
natural to wonder if the approach of [2] can be refined in order to incorporate joint model selection
using a nuclear norm penalty as in [6].

Based on the evidence of the practical efficiency of subspace-type methods [2], our goal in the
present note is to propose a theoretical study of a nuclear norm penalized version of the subspace
method from [2] which incorporates the main ideas in [6].

2. The subspace method 

2.1. Recall on the subspace approach. A real valued random discrete dynamical system $(x_t)_{t\in\mathbb N}$
admits a State Space representation if there exists a discrete time process $(s_t)_{t\in\mathbb N}$ such that

$$s_{t+1} = A s_t + K e_t$$
$$x_t = B s_t + e_t$$

where $(e_t)_{t\in\mathbb N}$ is the noise, and $A \in \mathbb R^{p\times p}$, $B \in \mathbb R^{1\times p}$, $K \in \mathbb R^{p\times 1}$ are parameter matrices. It is well
known that ARMA processes admit a State Space representation and vice versa [12].
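One direction of this equivalence can be checked numerically. The sketch below uses one standard construction (an observer-type canonical form with state dimension $\max\{p, q\}$); this particular choice of $A$, $B$, $K$, the function names and the zero initial state are assumptions of the example and are not taken from the note.

```python
import numpy as np

def arma_to_state_space(a, b):
    """Build (A, B, K) with s_{t+1} = A s_t + K e_t, x_t = B s_t + e_t.

    Assumed construction: first column of A holds the (zero padded) AR
    coefficients, ones sit on the superdiagonal, and K = a + b.
    """
    n = max(len(a), len(b), 1)
    a_pad = np.zeros(n); a_pad[:len(a)] = a
    b_pad = np.zeros(n); b_pad[:len(b)] = b
    A = np.zeros((n, n))
    A[:, 0] = a_pad
    A[:-1, 1:] = np.eye(n - 1)
    K = (a_pad + b_pad).reshape(n, 1)
    B = np.zeros((1, n)); B[0, 0] = 1.0
    return A, B, K

def simulate_state_space(A, B, K, e):
    s = np.zeros(A.shape[0])                 # assumed initial state s_0 = 0
    x = np.zeros(len(e))
    for t in range(len(e)):
        x[t] = (B @ s)[0] + e[t]
        s = A @ s + K[:, 0] * e[t]
    return x

def simulate_arma_direct(a, b, e):
    """ARMA recursion with zero pre-sample values, driven by the same noise."""
    x = np.zeros(len(e))
    for t in range(len(e)):
        ar = sum(a[i] * x[t - 1 - i] for i in range(len(a)) if t - 1 - i >= 0)
        ma = sum(b[j] * e[t - 1 - j] for j in range(len(b)) if t - 1 - j >= 0)
        x[t] = ar + ma + e[t]
    return x

a, b = [0.5, -0.2], [0.4, 0.3, 0.1]          # hypothetical coefficients
e = np.random.default_rng(0).standard_normal(50)
A, B, K = arma_to_state_space(a, b)
x_ss = simulate_state_space(A, B, K, e)
x_direct = simulate_arma_direct(a, b, e)
```

Both simulations are driven by the same noise sequence, so the two outputs should coincide up to rounding.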

2.2. Prediction. The problem of predicting $x_{t+j}$ for $j \ge 0$ based on the knowledge of $x_{t'}$, $t' < t$, and
$s_t$ can be solved easily following the approach by Bauer [2]. For given initial values $x_0$, $e_0$, the State
Space representation gives

$$x_{t+h} = e_{t+h} + \sum_{j=1}^{h} B A^{j-1} K e_{t+h-j} + B A^{h} s_t.$$

On the other hand, the State Space representation implies that

$$s_t = A s_{t-1} + K e_{t-1} = A s_{t-1} + K (x_{t-1} - B s_{t-1}) = (A - KB) s_{t-1} + K x_{t-1}.$$


Thus, we obtain

$$s_t = (A - KB)^t s_0 + \sum_{j=0}^{t-1} (A - KB)^j K x_{t-1-j}.$$

In what follows, we will assume that we observe $x_0, \ldots, x_T$ and that $t > 0$ is such that $T - 2t + 1 > 0$.
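The closed form for $s_t$ obtained by unrolling $s_t = (A - KB) s_{t-1} + K x_{t-1}$ can be verified on a small random system. This is a minimal sketch; the dimensions, the random matrices and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, t = 3, 6                                  # hypothetical state dimension and horizon
A = 0.5 * rng.standard_normal((p, p))
B = rng.standard_normal((1, p))
K = rng.standard_normal((p, 1))
s0 = rng.standard_normal(p)
e = rng.standard_normal(t)

# Run the state space recursion to obtain x_0..x_{t-1} and the state s_t.
s = s0.copy()
x = np.zeros(t)
for k in range(t):
    x[k] = (B @ s)[0] + e[k]
    s = A @ s + K[:, 0] * e[k]

# Closed form: s_t = (A - KB)^t s_0 + sum_j (A - KB)^j K x_{t-1-j}.
A_bar = A - K @ B
mp = np.linalg.matrix_power
s_closed = mp(A_bar, t) @ s0 + sum(
    (mp(A_bar, j) @ K[:, 0]) * x[t - 1 - j] for j in range(t)
)
```

The point of the identity is that the closed form uses only the observations and the initial state, not the noise.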


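2.3. Prediction with Hankel matrices. We will rewrite the prediction problem in terms of some
Hankel matrices. For this purpose, define

$\bar A = A - KB$, $\mathcal A_0 = [\bar A^{t} s_0, \bar A^{t+1} s_0, \ldots, \bar A^{T-t+1} s_0]$, $\mathcal K = [\bar A^{t-1} K, \cdots, \bar A^{2} K, \bar A K, K]$,

$$\mathcal O = \begin{bmatrix} B \\ BA \\ \vdots \\ BA^{t-1} \end{bmatrix} \quad \text{and} \quad M = \begin{bmatrix} 1 & 0 & \cdots & \cdots & 0 \\ BK & 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ BA^{t-2}K & BA^{t-3}K & \cdots & BK & 1 \end{bmatrix}.$$

Then, we have

(2.3) $\begin{bmatrix} x_t \\ \vdots \\ x_{2t-1} \end{bmatrix} = \mathcal O s_t + M \begin{bmatrix} e_t \\ \vdots \\ e_{2t-1} \end{bmatrix}$

and

(2.4) $s_t = \mathcal K \begin{bmatrix} x_0 \\ \vdots \\ x_{t-1} \end{bmatrix} + (A - KB)^t s_0.$

Combining (2.3) and (2.4), we thus obtain

$$\begin{bmatrix} x_t \\ \vdots \\ x_{2t-1} \end{bmatrix} = \mathcal O \mathcal K \begin{bmatrix} x_0 \\ \vdots \\ x_{t-1} \end{bmatrix} + \mathcal O (A - KB)^t s_0 + M \begin{bmatrix} e_t \\ \vdots \\ e_{2t-1} \end{bmatrix}.$$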

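Now, define

$$X_{past} = \begin{bmatrix} x_0 & x_1 & \cdots & x_{T-2t+1} \\ x_1 & x_2 & \cdots & x_{T-2t+2} \\ \vdots & & & \vdots \\ x_{t-1} & x_t & \cdots & x_{T-t} \end{bmatrix} \quad \text{and} \quad X_{future} = \begin{bmatrix} x_t & x_{t+1} & \cdots & x_{T-t+1} \\ x_{t+1} & x_{t+2} & \cdots & x_{T-t+2} \\ \vdots & & & \vdots \\ x_{2t-1} & x_{2t} & \cdots & x_T \end{bmatrix}.$$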

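Both matrices are Hankel matrices. The first one represents the past values and the second one the
future values. Define also the noise matrix

$$E = \begin{bmatrix} e_t & e_{t+1} & \cdots & e_{T-t+1} \\ e_{t+1} & e_{t+2} & \cdots & e_{T-t+2} \\ \vdots & & & \vdots \\ e_{2t-1} & e_{2t} & \cdots & e_T \end{bmatrix}.$$

All these Hankel matrices are related by the following equation

$$X_{future} = \mathcal O \mathcal K X_{past} + \mathcal O \mathcal A_0 + M E.$$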
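The objects of this section are easy to assemble numerically. The sketch below builds the Hankel matrices, $\mathcal O$, $\mathcal K$ and $M$ for a small simulated system and checks the relation obtained by combining (2.3) and (2.4), which corresponds to the first column of the Hankel matrices. The sizes, the random system and the helper name `hankel_block` are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
p, t, T = 2, 4, 15                           # hypothetical sizes with T - 2t + 1 > 0
A = 0.4 * rng.standard_normal((p, p))
B = rng.standard_normal((1, p))
K = rng.standard_normal((p, 1))
s0 = rng.standard_normal(p)

# Simulate s_{k+1} = A s_k + K e_k, x_k = B s_k + e_k for k = 0..T.
e = rng.standard_normal(T + 1)
x = np.zeros(T + 1)
s = s0.copy()
for k in range(T + 1):
    x[k] = (B @ s)[0] + e[k]
    s = A @ s + K[:, 0] * e[k]

def hankel_block(v, start, rows, cols):
    """Matrix whose (i, j) entry is v[start + i + j]."""
    return np.array([[v[start + i + j] for j in range(cols)] for i in range(rows)])

ncol = T - 2 * t + 2
X_past = hankel_block(x, 0, t, ncol)
X_future = hankel_block(x, t, t, ncol)
E = hankel_block(e, t, t, ncol)

mp = np.linalg.matrix_power
A_bar = A - K @ B
O = np.vstack([B @ mp(A, i) for i in range(t)])                  # t x p
calK = np.hstack([mp(A_bar, t - 1 - m) @ K for m in range(t)])   # p x t
M = np.eye(t)                                # unit lower triangular Toeplitz matrix
for i in range(t):
    for k in range(i):
        M[i, k] = (B @ mp(A, i - k - 1) @ K)[0, 0]

# First column: [x_t..x_{2t-1}] = O K [x_0..x_{t-1}] + O (A - KB)^t s_0 + M [e_t..e_{2t-1}].
lhs = X_future[:, 0]
rhs = O @ calK @ X_past[:, 0] + O @ mp(A_bar, t) @ s0 + M @ E[:, 0]
```

Note that the product `O @ calK` has rank at most `p`, which is the property exploited in the low rank estimation step.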


3. The estimation problem 

In the last section, we showed that the matrices A, B and K of the State Space model entered nicely
into an equation allowing prediction of future values based on past values of the dynamical system.
Our goal is now to use this equation to estimate the matrices A, B and K. One interesting feature
of our procedure is that the dimension p of the State Space model can be estimated jointly with the
matrices themselves.

3.1. Estimating $\mathcal O\mathcal K$. The matrix $\mathcal O\mathcal K$ can be estimated using a least squares approach corresponding
to solving

(3.5) $\min_{L \in \mathbb R^{t\times t}} \frac{1}{2} \| X_{future} - L X_{past} \|_F^2.$

This procedure will make sense if the term $\mathcal O \mathcal A_0$ is small. This can indeed be justified if t is large and
if $\|\bar A\|$ is small. Let us call $\hat L$ a solution of (3.5).
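Problem (3.5) is an ordinary multi-response least squares problem and can be solved with a standard routine. The sketch below is illustrative only: the function name `least_squares_L` and the synthetic noiseless data are assumptions of the example.

```python
import numpy as np

def least_squares_L(X_past, X_future):
    """Return a minimizer of 0.5 * ||X_future - L @ X_past||_F^2 over L."""
    # L @ X_past = X_future is equivalent to X_past.T @ L.T = X_future.T,
    # which is the form expected by np.linalg.lstsq.
    L_T, *_ = np.linalg.lstsq(X_past.T, X_future.T, rcond=None)
    return L_T.T

# Synthetic check: with noiseless data and full row rank X_past,
# the least squares solution recovers the generating matrix.
rng = np.random.default_rng(0)
t, ncol = 4, 30
L_true = rng.standard_normal((t, t))
X_past = rng.standard_normal((t, ncol))
X_future = L_true @ X_past
L_hat = least_squares_L(X_past, X_future)
```

In the setting of this section the columns of `X_past` and `X_future` would come from the Hankel matrices of the observed series rather than from random draws.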
3.2. Nuclear Norm penalized $\ell_2$-norm for low rank estimation. An interesting property of the
matrix $\mathcal O\mathcal K$ is that its rank is the State's dimension p when A has full rank. Moreover, $\mathcal O\mathcal K$ has small
rank compared to t when t is large compared to p. Therefore, one is tempted to penalize the least
squares problem (3.5) with a low-rank promoting penalty.

One option is to try to solve

(3.6) $\min_{L \in \mathbb R^{t\times t}} \frac{1}{2} \| X_{future} - L X_{past} \|_F^2 + \lambda\, \mathrm{rank}(L).$

The main drawback of this approach is that the rank function is neither continuous nor convex. This
renders the optimization problem intractable in practice. Fortunately, the rank function admits a well
known convex surrogate, which is the nuclear norm, i.e. the sum of the singular values, denoted by
$\|\cdot\|_*$. Thus, a nice convex relaxation of (3.6) is given by

(3.7) $\min_{L \in \mathbb R^{t\times t}} \frac{1}{2} \| X_{future} - L X_{past} \|_F^2 + \lambda \| L \|_*.$

It has been observed in practice that nuclear norm penalized least squares provide low rank solution 
for many interesting estimation problems HU 
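A problem of the form (3.7) can be solved numerically in several ways. The sketch below uses proximal gradient descent with singular value soft-thresholding, which is a generic solver chosen here for illustration and is not a method prescribed by the note; the function names, the step size rule and the synthetic data are assumptions.

```python
import numpy as np

def svt(G, tau):
    """Singular value soft-thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def objective(L, X_past, X_future, lam):
    r = X_future - L @ X_past
    return 0.5 * np.sum(r ** 2) + lam * np.linalg.norm(L, "nuc")

def nuclear_norm_ls(X_past, X_future, lam, n_iter=500):
    """Proximal gradient iterations for 0.5*||X_future - L X_past||_F^2 + lam*||L||_*."""
    step = 1.0 / np.linalg.norm(X_past, 2) ** 2    # 1 / Lipschitz constant of the gradient
    L = np.zeros((X_future.shape[0], X_past.shape[0]))
    for _ in range(n_iter):
        grad = (L @ X_past - X_future) @ X_past.T
        L = svt(L - step * grad, lam * step)
    return L

# Synthetic rank-2 example (hypothetical data).
rng = np.random.default_rng(0)
t, ncol = 6, 40
L_true = rng.standard_normal((t, 2)) @ rng.standard_normal((2, t))
X_past = rng.standard_normal((t, ncol))
X_future = L_true @ X_past

L_unpenalized = nuclear_norm_ls(X_past, X_future, lam=0.0)
L_penalized = nuclear_norm_ls(X_past, X_future, lam=1.0)
lam_big = 1.01 * np.linalg.norm(X_future @ X_past.T, 2)
L_zero = nuclear_norm_ls(X_past, X_future, lam=lam_big)
```

With `lam=0.0` the iteration reduces to plain gradient descent on the least squares problem, while a penalty above the spectral norm of `X_future @ X_past.T` forces the zero solution; intermediate values trade data fit against rank.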


4. Main results 

The penalized least-squares problem (3.7) can be transformed into the following constrained problem

(4.8) $\min_{L \in \mathbb R^{t\times t}} \|L\|_*$ subject to $\| X_{future} - L X_{past} \|_F \le \eta$,

for some appropriate choice of $\eta$.

Let $\Sigma$ denote the covariance matrix of $[x_0, \cdots, x_{t-1}]^t$ and let $\Sigma^{\pm 1/2}$ denote the square root of $\Sigma^{\pm 1}$.
Then, let $H$ be the random matrix whose components are given by

$$H_{s,r} = \sum_{s'=0}^{T-2t+1} \varepsilon_{s,s'} z_{s'+r},$$

where $\varepsilon_{s,s'}$, $s = 0, \ldots, t-1$ and $s' = 0, \ldots, T-2t+1$, are independent Rademacher random variables
which are independent of $z_{s'}$, $s' = 0, \ldots, T-t$. Let $\Sigma_H$ denote the covariance matrix of vec$(H)$. Let
$\mathcal M$ denote the operator defined by

(4.9) $\mathcal M(\cdot) = \mathrm{Mat}\big(\Sigma_H^{-1/2} \mathrm{vec}(\cdot)\big)$

and let $\mathcal M^{-*}$ denote the adjoint of the inverse of $\mathcal M$. The fact that $\mathcal M$ is invertible is easily obtained
(see Section 6.3.3) and is seen from the fact that $\Sigma_H$ has all its eigenvalues equal to $T - 2t + 1$ according
to Section 6.3.2. Let $\mathcal S$ be the operator defined by

$$\mathcal S(\cdot) \mapsto \mathcal M^{-*}(\cdot)\, \Sigma^{-1/2}$$

and let $\mathcal T$ be the mapping

$$\mathcal T(\cdot) \mapsto \frac{1}{\sqrt{t(T-2t+2)}}\, \mathcal M^{-1}\big(\cdot\, \Sigma^{1/2}\big).$$

Our main result is the following theorem. 

Theorem 4.1. Let $\xi$ be any positive real number. Assume that $\eta$ is such that
(4.10) $\|\mathcal O \mathcal A_0 + M E\| > \eta$

with probability less than or equal to $e^{-\nu^2/2}$ for some $\nu > 0$. Then, with probability greater than or
equal to $1 - e^{-\nu^2/2}$,

(4.11) $\|\mathcal O\mathcal K - \hat L\|_F \le \frac{2\eta}{\Lambda}$

where


A > £y/t(T — 2 i + 1 ) ( 1 


4£ fe\\ 


(Tu 


(x 1 ^) Vt' 


- 2 V 2 


<Tmax(X) 


T — 2t + 1 V er m i n (X) 


/-\ r rank(0/C) 

2 ci + l)+cvt) vt- 7 == + 2 t — vf. 


C -\/<Tmin(X) 


In the remainder of this section, we introduce the results, notations and tools for proving this
theorem. The proof is given in Section 4.6.

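4.1. Some notations. For all $s = 0, \ldots, t-1$ and $s' = 0, \ldots, T-2t+1$, let $\mathcal A_{s,s'}$ denote the operator
defined by

(4.12) $\mathcal A_{s,s'}(L) = \sum_{r=0}^{t-1} L_{s,r}\, x_{s'+r}$

and let $\mathcal A$ denote the operator

(4.13) $\mathcal A(L) \mapsto \big(\mathcal A_{s,s'}(L)\big)_{s=0,\ldots,t-1,\ s'=0,\ldots,T-2t+1}.$

The descent cone of the nuclear norm at $\mathcal O\mathcal K$, denoted by $\mathcal D(\|\cdot\|_*, \mathcal O\mathcal K)$, is defined by

(4.14) $\mathcal D(\|\cdot\|_*, \mathcal O\mathcal K) = \bigcup_{\tau>0} \big\{ D \in \mathbb R^{t\times t} \mid \|\mathcal O\mathcal K + \tau D\|_* \le \|\mathcal O\mathcal K\|_* \big\}.$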

4.2. A deterministic inequality. The following result will be the key of our analysis. 
Theorem 4.2. Em Assume that 

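(4.15) $\|\mathcal O \mathcal A_0 + M E\| \le \eta.$

Let $\hat L$ denote any solution of (4.8). Then,

(4.16) $\|\mathcal O\mathcal K - \hat L\|_F \le \frac{2\eta}{\lambda_{\min}(\mathcal A, \mathcal D(\|\cdot\|_*, \mathcal O\mathcal K))},$

where

(4.17) $\lambda_{\min}(\mathcal A, \mathcal D(\|\cdot\|_*, \mathcal O\mathcal K)) = \min_{\|D\|_F = 1,\ D \in \mathcal D(\|\cdot\|_*, \mathcal O\mathcal K)} \|\mathcal A(D)\|_F.$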

4.3. A lower bound on $\lambda_{\min}(\mathcal A, \mathcal D(\|\cdot\|_*, \mathcal O\mathcal K))$. We will closely follow the approach of Tropp based
on Mendelson's bound. For this purpose, we will need the definition of the Gaussian mean width
$w_G(X)$ of a set $X$:

$$w_G(X) = \mathbb E \Big[ \sup_{x \in X} \langle G, x \rangle \Big]$$


where the expectation is taken with respect to the Gaussian random vector G taking values in 
The statistical dimension of X (see e.g. ED) is Let us also denote by Q^ the quantity 


t -1 


Qd D ) = 7 X P 


s=0 


t-1 




r =0 




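which, as one might easily check, does not depend on $s'$. Recall that $\Sigma$ is the covariance matrix of
$[x_0, \cdots, x_{t-1}]^t$ and that $\Sigma^{-1/2}$ denotes the square root of $\Sigma^{-1}$. Thus,

$$\begin{bmatrix} z_0 \\ \vdots \\ z_{t-1} \end{bmatrix} := \Sigma^{-1/2} \begin{bmatrix} x_0 \\ \vdots \\ x_{t-1} \end{bmatrix}$$

follows the standard Gaussian distribution $\mathcal N(0, I)$. Let $\tilde D = D \Sigma^{1/2}$. We now state Tropp's result.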
Lemma 4.3. Define


K = 


\J1 (T — 2t + 


^M- 1 (s-^Pdl-H *,0/C)) 


We have 

Xmin(A,V(\\ • \\*,OK)) > Wt(T - 2t - 2) . inf Q^(D) - 

i?=l 

with probability greater than or equal to 1 — exp(—u 2 /2). 

Proof. See Section 6.1.

4.4. A lower bound on inf||£>£i/ 2|| F=1 Q 2 £ } {D). Since 


w g {K) - vf 


t -1 

Z — ^ ^ D s ,r^s'+r 
r =0 


/ -U 

follows the law jV(0, X]r=o r)> us i n g Lemma [6.21 from the Appendix 

/ /t-i \ \ 1 


we get 


Thus, setting 




\r =0 


□ 


we obtain 


This finally gives 


u = 


£r=0 


2 ’ 


P 


t-1 

r =0 






Etc 4 2 ,r 


v s=0 


1 t-1 


-=0 \]Ylr~=\Dlr 

Let us now compute a lower bound to the infimum of this quantity over the set of $D$ satisfying
$\|D \Sigma^{1/2}\|_F = 1$. For this purpose, first note that

_ inf Q 2 z(D)>1- sup “F=(^) 4 7 


7r V 2 / t ^^ /x^t— 1 7a2 

s= 0 y ^ s,r 


11 "■* — —\ / y ^—*/ —u 

On the other hand, simple manipulations of the optimality conditions using symmetry prove that 

. t -1 


Therefore, 

(4.18) 


1 1 

sup - V = = V7, 

■ ||rSi 1 -» veSau 

inf Q 25 (D) >1-4 (f) 1 (E 1 ' 2 ) Vt 


110 ^ 1 / 211 ^ = 1 


4.5. The Gaussian mean width of K. The Gaussian mean width of a set $X$ and its statistical
dimension are related by

(4.19) $w_G(X)^2 \le \delta(X) \le w_G(X)^2 + 1.$

See [TJ Proposition 10.2] for a proof. In this subsection, we estimate the Gaussian mean width of K 
using its statistical dimension. 
4.5.1. The descent cone $\mathcal D(\|\cdot\|_*, \mathcal O\mathcal K)$. The descent cone of the nuclear norm satisfies [14, Eq. (4.1)]
which we recall now

(4.20) $\mathcal D(\|\cdot\|_*, \mathcal O\mathcal K)^\circ = \mathrm{cone}\big(\partial \|\cdot\|_*(\mathcal O\mathcal K)\big).$

4.5.2. Computation of K°. Using Proposition 4.2 in m, we obtain 

(4.21) sup ( D*,H\ < dist (h,K°) . 

D*£K 

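We now have to compute the polar cone of K. We have

$$K^\circ = \{A \mid \langle A, D \rangle \le 0 \ \ \forall D \in K\} = \Big\{A \ \Big|\ \Big\langle \frac{1}{\sqrt{t(T-2t+2)}} \mathcal M^{-1}\big(A \Sigma^{1/2}\big), D \Big\rangle \le 0 \ \ \forall D \in \mathcal D(\|\cdot\|_*, \mathcal O\mathcal K) \Big\}.$$

Recall that $\mathcal T$ is the mapping

$$A \mapsto \frac{1}{\sqrt{t(T-2t+2)}} \mathcal M^{-1}\big(A \Sigma^{1/2}\big).$$

Then, we obtain that

$$K^\circ = \mathcal T^{-1}\big(\mathcal D(\|\cdot\|_*, \mathcal O\mathcal K)^\circ\big).$$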

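4.5.3. An upper bound on the statistical dimension of K. Let us write the singular value decomposition
of $\mathcal O\mathcal K$ as

$$\mathcal O\mathcal K = [\, U_1 \ U_2 \,] \begin{bmatrix} \mathrm{diag}(\sigma_{\mathcal O\mathcal K}) & 0 \\ 0 & 0 \end{bmatrix} [\, V_1 \ V_2 \,]^*$$

where $\sigma_{\mathcal O\mathcal K}$ is the vector of the singular values of $\mathcal O\mathcal K$. Moreover, the subdifferential of the Schatten
norm is given by

$$\partial \|\cdot\|_*(\mathcal O\mathcal K) = \Big\{ [\, U_1 \ U_2 \,] \begin{bmatrix} I & 0 \\ 0 & Y \end{bmatrix} [\, V_1 \ V_2 \,]^* \ \Big|\ \|Y\| \le 1 \Big\}.$$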
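The description of the subdifferential can be sanity-checked numerically. The following sketch draws a random low rank matrix (standing in for $\mathcal O\mathcal K$), builds one element $G = U_1 V_1^* + U_2 Y V_2^*$ with $\|Y\| \le 1$, and evaluates the subgradient inequality $\|Z\|_* \ge \|X\|_* + \langle G, Z - X\rangle$ at random points. All sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
t, r = 5, 2
X = rng.standard_normal((t, r)) @ rng.standard_normal((r, t))   # rank r by construction

U, s, Vt = np.linalg.svd(X)              # full SVD: U and Vt are t x t
U1, U2 = U[:, :r], U[:, r:]
V1, V2 = Vt[:r].T, Vt[r:].T

Y = rng.standard_normal((t - r, t - r))
Y *= 0.9 / np.linalg.norm(Y, 2)          # enforce spectral norm <= 1
G = U1 @ V1.T + U2 @ Y @ V2.T            # candidate subgradient at X

def nuc(Z):
    return np.linalg.norm(Z, "nuc")      # sum of singular values

def inner(P, Q):
    return float(np.sum(P * Q))          # trace inner product

# Subgradient inequality gaps at random test points; all should be nonnegative.
gaps = []
for _ in range(20):
    Z = rng.standard_normal((t, t))
    gaps.append(nuc(Z) - nuc(X) - inner(G, Z - X))
```

The identity $\langle G, X\rangle = \|X\|_*$ holds as well, since the $U_2 Y V_2^*$ part is orthogonal to $X$.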


Therefore, using (4.21), we obtain that

2~\ 


E 


/ 


sup 

l|n*|| F =i, 

D*£K 


D*,H 


< E 


min ||T 1 ( t 

r>0,||y||<l 


U\V\ 0 

0 u 2 yv% 


-ml 


Thus, we get 

( 


E 


sup 


\\ u 
\ D 


D*,H 


d*gk 


< E 


mm 


T>0, || V ||<1 


T- l \\ mihVfWl + || rU 2 YVf - T 2)2 (H)\\l 


+ \\Ti,2(H)\\ 2 F + \\T2,im\ 2 F 


where

$$\mathcal T = \begin{bmatrix} T_{1,1} & T_{1,2} \\ T_{2,1} & T_{2,2} \end{bmatrix}$$

and the dimension of $T_{1,1}$ is $\mathrm{rank}(\mathcal O\mathcal K) \times \mathrm{rank}(\mathcal O\mathcal K)$ and the dimension of $T_{j,j'}$ for all other combinations
of $j$ and $j'$ is easily deduced from the dimension of $\mathcal T$, which gives, after taking $\tau = \|U_2 T_{2,2}(H) V_2\|$,

2- 


E 


sup 

|d*IIf=i 

d*gk 


D*. H 


Note that 


< o-min(T) -1 |e [r 2 ] rank (OK) 

+ ^max IT\ ,2) 2 + Cr max (72,l) 2 ^ E 

r< ifellimi- 


\H\ 


By Gordon's theorem [16, Theorem 10.2], $\mathbb E \|\tilde H\| \le 2\sqrt{t}$. Moreover, by Lemma 6.1 in the Appendix,


E 


\H\ 


< - (2 ct + 1) + 2 Vt. 


On the other hand, E 




6(K) = E 


= 2 1. Therefore, we obtain that 

2-1 


sup (D*,H 

l|o*|| F =i, 

d*gk 


< 2 <r m in(T) 1 ^ |fel| 2 ((2 ct + 1) + cVtj rang (OJC) 


Using (4.19), we obtain that
(4.22) 


+ 2 Cr max (7l,2) 2 + <Tmax(72,l) 2 t 


W G (K) < 


\ 


2 OYniTi(T) 1 \ ||'7^, 2 || 2 ((2 ct + 1) + c Vt) rang (OK) + 2 cr max (Ti, 2 ) 2 + cr max (7i,i) 2 t 


4.6. Proof of Theorem 4.1. Combining Lemma 4.3 with (4.18) and (4.22), we obtain that


4£ 2 ( e s 1 


Amin(A^(|| ■ \\*,OJC)) > t VT - 2t - 2 -^-J cr min 


2^2 


(s 1 / 2 ) 


<Tmin(‘5) 
Using that 

and 




2 + 0-max(7^l) 2 ) t - u£ 


feii 2 < mi 2 , 


^max (ri,2) 2 +amax(r 2 ,l) 2 <2||T|| 2 , 

and combining this last inequality with Theorem 4.2, we obtain the following proposition.
Proposition 4.4. Let $\xi$ be any positive real number. Assume that $\eta$ is such that
(4.23) $\|\mathcal O \mathcal A_0 + M E\| > \eta$
with probability less than or equal to $e^{-\nu^2/2}$ for some $\nu > 0$. Then, with probability greater than or
equal to $1 - e^{-\nu^2/2}$,

2r; 


(4.24) 

where 


\\OJC-L\\ f < 


4£ ( e 


A > £y/t(T-2t + l) ( 1 -(- 


2V2\\r\\ 


A ’ 


O n 


(s 1 / 2 ) Vt 


°'min(‘5) 




rang(0/C) 
C O' min (T) 


+ 2 i - vf. 


Combining this result with the bounds from Section 6.3.3, the proof is completed.

5. Conclusion 

The goal of the present note is to show that the performance of nuclear norm penalized subspace- 
type methods can be studied theoretically. We concentrated on a special approach due to Bauer [2].
Our approach can easily be extended to the case of the method promoted in [6]. Our next objective 
for future research is to address the case of more general noise sequences such as in [9]. 

6. Appendix: Technical intermediate results 
In this section, we gather some technical results used in the proof of Theorem 4.1.

6.1. Proof of Lemma 4.3.

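6.1.1. First step. We have

(6.25) $\lambda_{\min}(\mathcal A, \mathcal D(\|\cdot\|_*, \mathcal O\mathcal K)) = \min_{\|D\|_F = 1,\ D \in \mathcal D(\|\cdot\|_*, \mathcal O\mathcal K)} \|\mathcal A(D)\|_F$

(6.26) $= \min_{\|D\|_F = 1,\ D \in \mathcal D(\|\cdot\|_*, \mathcal O\mathcal K)} \Big( \sum_{s=0}^{t-1} \sum_{s'=0}^{T-2t+1} \Big( \sum_{r=0}^{t-1} D_{s,r}\, x_{s'+r} \Big)^2 \Big)^{1/2}.$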

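Recall that $\Sigma$ is the covariance matrix of $[x_0, \cdots, x_{t-1}]^t$ and that $\Sigma^{-1/2}$ denotes the square root of $\Sigma^{-1}$.
Thus,

$$\begin{bmatrix} z_0 \\ \vdots \\ z_{t-1} \end{bmatrix} := \Sigma^{-1/2} \begin{bmatrix} x_0 \\ \vdots \\ x_{t-1} \end{bmatrix}$$

follows the standard Gaussian distribution $\mathcal N(0, I)$. Recall also that $\tilde D = D \Sigma^{1/2}$. Then, we have

(6.27) $\lambda_{\min}(\mathcal A, \mathcal D(\|\cdot\|_*, \mathcal O\mathcal K)) = \min_{\|\tilde D \Sigma^{-1/2}\|_F = 1,\ \tilde D \in \mathcal D(\|\cdot\|_*, \mathcal O\mathcal K)\Sigma^{1/2}} \Big( \sum_{s=0}^{t-1} \sum_{s'=0}^{T-2t+1} \Big( \sum_{r=0}^{t-1} \tilde D_{s,r}\, z_{s'+r} \Big)^2 \Big)^{1/2}.$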


Now, we have 

1 

t (T — 2t + 1) 


t-l T—2t+l /t-l 

'y i 'y i ( 'y i D Str z s i+ r 


i 

2 \ 2 


> 


1 


s=0 s'=0 \r=0 

which gives, by Markov’s inequality 


t (T — 2t + 1) 


t-l T—2t+l 

E E 

s=0 s'=0 


t-l 

~y ^ D sr z s i- i_ r 


r=0 


1 


t (T - 2t + 1) 


t-l T-2t+l /t-l 

E E E D s ,r^s’+r 

s=0 s'=0 \r=0 / 


l 

2 \ 2 


> 


t (T - 2f + 1) 


t-l T—2t+l ( 

EE 1 

s=0 s'= 0 l 


t-l 

Y ^ T) sr z s '- i_ r 

r =0 
Thus, we obtain


1 


t(T — 2t + l) 

> ZQ 2 e(D) - 

6.1.2. Second step. Let 

f(z 0 ,...,zr-t ) = 


t-1 T-2t+l ft- 1 

y ] y ] ( y ] Ds,rz S '+r 


1 

2 \ 2 


s=0 s'=0 \r=0 

£ 


t-i T-2t+i / / 

t (T — 2t +"l) 

' s=0 s'=0 V 1 


t -1 

^ ^ D s ,rZs'+r 


r =0 




t—1 T-2t+l / / 

sup X] ( ^(-D) - 1 < 

||DE- 1 /2|| f = i j s= 0 s , =q \ [ 

DeDflHI^O/QE 1 / 2 


t-1 

r=0 


Sjr-^s'+r 




We will now use the bounded difference inequality to control this quantity. For this purpose, notice 
that 

l/(Co, Cr-t) - /(Co, ■ ■■,&•■■> Cr-t)| < 2 t (T — 2t + 2). 

for all (Co) • • • j Csj • • •) Ct— t) in M r_t+1 and € M. Thus, 

f(zo,...,z T -t) ,z T -t)] < v ^/t(T -2t + 2), 

with probability 1 — e _zy2 / 2 for all z/ E R+. Now, the expected supremum can be bounded in the same 
manner as in m Equation 5.6]. 


E [f(z 0 ,... ,z T -t)] < | IE 


t- 1 T—2t+l t-1 

SUP £s ’ s ' ^2 Ds,rZ s '+r 

||D 2 - 1 / 2 || F =l, s =0 s '=0 r =0 

.ileX'(||-||,,0/C)E 1 /2 


where $\varepsilon_{s,s'}$, $s = 0, \ldots, t-1$ and $s' = 0, \ldots, T-2t+1$, are independent Rademacher random variables which
are independent of $z_{s'}$, $s' = 0, \ldots, T-t$. Therefore, we obtain


inf ,_L 

IIde- 1 / 2 || f= i, y t (T — 2t + 1) 

Dex>(||-||*,OA:)E 1 / 2 


t-1 T-2t+l /t-1 

E E E Ds,rZs'-\-r 

s=0 s'=0 \r=0 / 


l 

2 \ 2 


> £Q2t(D) - 




E 


t(T 2t + l) _dge-V 2 i>(||-||»,o/c) Go 


t-1 T—2f+l t-1 

y ^ y ^ £s,s' y " D Str z s i+ r 


0 r=0 


+u yf (T — 2t + 2) ], 


which gives 


||DE- 1 /2|| F= i 


t-1 T—2t+l /t-1 

y i ^ i ( y i f^s, r z S '-\-r 

s=0 s'=0 \r =0 / 


l 

2 \ 2 


inf 

-1/21 

Dez>(||-||.,o/c)E 1 / 2 

> £ -\/i — "b 2) 


— 2 E 


1 


t-1 T—2t+l t-1 

SUP U (rp _ 9 . =7W ^2 ^2 £ s,s'^2D s , r Z s ' +r 

||JDE-V2|| f =i, V t K 1 Zt ~T~ Z ) s =0 s '=0 r=0 

sgx>(IHI„o/c)e 1 /2 
Let us denote by W the quantity


W = E 


sup 


I|ue- 1 /2|| f= i i yjt (T 2 1 + 2) s=Q s , =i 


t-l T—2t+l t -1 

xi x £s ’ s ' 


r^s'+r 


.Dex>(|H|»,OK:)E 1 /2 


s'=0 r =0 


Then, we have 


W = E 


1 


sup 


(D,H) 


I|£>e- 1 / 2 h f =i, \A (T + 2) 

.DGl’(||-||*,C)/C)E 1 /2 

where we recall that $H$ is the random matrix whose components are given by

$$H_{s,r} = \sum_{s'=0}^{T-2t+1} \varepsilon_{s,s'} z_{s'+r}$$

and $\Sigma_H$ denotes the covariance matrix of vec$(H)$. Let $\tilde H = \mathcal M(H)$ where $\mathcal M$ denotes the operator
defined by

$$\mathcal M(\cdot) = \mathrm{Mat}\big(\Sigma_H^{-1/2} \mathrm{vec}(\cdot)\big).$$

Then $\tilde H$ is a Gaussian matrix with i.i.d. components with law $\mathcal N(0,1)$. Using the invertibility of $\mathcal M$
proved in Section 6.3.3, we get


where 


W = E 


K = 


SU P||A4-*(o* E- 1 /2)|| f =1, 
d*gk 




y/t (T-2t + 2) 


D*,H 


„OK) S2 


where we recall that $\mathcal M^{-*}$ is the adjoint of the inverse of $\mathcal M$. Moreover, we have


sup ( D\H) < 

||A1-*(D*) E- 1 /2|| f = i i 
D*€K 


1 

-TFT SU P 

Tmin [p ) ||d*|| f =i, 

D*&K 


D*, H 


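where $\sigma_{\min}(\mathcal S)$ is the smallest singular value of the operator $\mathcal S$ defined by

$$\mathcal S(\cdot) \mapsto \mathcal M^{-*}(\cdot)\, \Sigma^{-1/2}.$$

Thus,

$$W \le \frac{w_G(K)}{\sigma_{\min}(\mathcal S)}$$

and the proof is completed.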


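6.2. Control of $\mathbb E \|\tilde H\|^2$.

Lemma 6.1. We have

$$\mathbb E \big[ \|\tilde H\|^2 \big] \le \Big( 1 + \frac{1}{2ct} \Big)\, \mathbb E \big[ \|\tilde H\| \big]^2 + \mathbb E \big[ \|\tilde H\| \big].$$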


Proof. By Gaussian concentration m Proposition 4] and the fact that the spectral (operator) norm 
is 1-Lipschitz, we obtain that for all u > 0, 

$$\mathbb P \big( \|\tilde H\| \ge \mathbb E[\|\tilde H\|] + u \big) \le e^{-c u^2}$$

for some absolute positive constant $c$. Taking $u = \delta\, \mathbb E \|\tilde H\|$, we obtain that

$$\mathbb P \big( \|\tilde H\| \ge (1 + \delta)\, \mathbb E \|\tilde H\| \big) \le e^{-4 c \delta^2 t}.$$


Thus, 


E 


\H\ 


r*+oo 


'0 


F \\H\\ 2 > s ) ds 


rE \\H\\ 


= E 


< E 


\H\ 


\H\ 


1 2 


r 

J e[ 

r 

Je\ 


H || 2 > s ) ds + 

+oo 


/*+OO 

JeIwhW 


mim 


r»+oo 


E[||H| 

H\\ > a/s') ds 


H\\ 2 > s ) ds 


exp —4 c 




E[||F||] 


t ds 


and making the change of variable $r = (\sqrt{s} - \mathbb E[\|\tilde H\|])^2$, we obtain


E 


\H\ 


r-c 


H\r > s ) ds 


< E 


< E 


\H\\ 


\H\ 


1 2 


f +C ° exp (—4 --ri fl + T) 

Jo V EIIIHII ] 2 M VfJ 

r+o o 

2 / 

■J 0 


exp —4 


E[||F||] 2 
ct 


EOI^H ] 2 


r) dr + 


dr 

EIII^II] 2 1 


m\\ n \u i 

L Tr dT ■ 


Thus, we obtain 


E 


\H\\ 


< E 


\H\\ 


1 2 


+E[im- 


< ( 1 H—— ) E 
“ 2 ct 


\H\ 


i 2 


E[||^ll] 2 

2 ct 

+n\\H\\]. 


exp ( —4 


ct 


E[IW 


+oo 


J 0 


This completes the proof. □

6.3. Some properties of $\Sigma$, $\Sigma_H$, $\mathcal M$, $\mathcal S$ and $\mathcal T$.


6.3.1. The spectrum of $\Sigma$. The spectrum of $\Sigma$ can be studied using the methods of Grenander and
Szegő [8]. In [5], the classical results are extended to the case of generalized fractional processes. It
was shown in particular by Grenander and Szegő in [8, Chapter 5] that $2\pi m \le \lambda \le 2\pi M$ for any
eigenvalue $\lambda$ of $\Sigma$, where $m$ and $M$ are the essential infimum and supremum of the spectral density
function $f$ of the process. For ARMA processes, this function is just

2tt f(e lu ) 

where 

$\phi(z) = 1 - a_1 z - \cdots - a_p z^p$ and $\theta(z) = 1 + b_1 z + \cdots + b_q z^q$.
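The eigenvalue bounds can be illustrated on an AR(1) process. The sketch below assumes the usual textbook normalization $f(\omega) = \frac{\sigma^2}{2\pi} \frac{|\theta(e^{-i\omega})|^2}{|\phi(e^{-i\omega})|^2}$ of the spectral density and the standard AR(1) autocovariance $\gamma(k) = \sigma^2 a_1^{|k|}/(1 - a_1^2)$; these formulas, the function name and the numerical values are assumptions of the example.

```python
import numpy as np

def arma_spectral_density(a, b, omega, sigma2=1.0):
    """Assumed normalization: f(w) = sigma2/(2 pi) * |theta(e^{-iw})|^2 / |phi(e^{-iw})|^2."""
    omega = np.asarray(omega, dtype=float)
    z = np.exp(-1j * omega)
    phi = np.ones_like(z) - sum(ai * z ** (i + 1) for i, ai in enumerate(a))
    theta = np.ones_like(z) + sum(bj * z ** (j + 1) for j, bj in enumerate(b))
    return sigma2 / (2 * np.pi) * np.abs(theta) ** 2 / np.abs(phi) ** 2

# AR(1) example with unit noise variance.
a1 = 0.5
omega = np.linspace(-np.pi, np.pi, 2001)
f = arma_spectral_density([a1], [], omega)
m, M = f.min(), f.max()                      # grid approximations of inf and sup of f

# Covariance matrix of n consecutive values of the AR(1) process.
n = 8
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Sigma = a1 ** lags / (1 - a1 ** 2)
eigs = np.linalg.eigvalsh(Sigma)
```

For this example $2\pi m = 1/(1 + a_1)^2$ and $2\pi M = 1/(1 - a_1)^2$, and all eigenvalues of `Sigma` should lie between these two values.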

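6.3.2. The spectrum of $\Sigma_H$. Recall that $H$ is the random matrix whose components are given by

$$H_{s,r} = \sum_{s'=0}^{T-2t+1} \varepsilon_{s,s'} z_{s'+r},$$

where $\varepsilon_{s,s'}$, $s = 0, \ldots, t-1$ and $s' = 0, \ldots, T-2t+1$, are independent Rademacher random variables
which are independent of $z_{s'}$, $s' = 0, \ldots, T-t$.
Using matrix representation, we have $H = \varepsilon Z$ with

$$\varepsilon = \begin{pmatrix} \varepsilon_{0,1} & \varepsilon_{0,2} & \cdots & \varepsilon_{0,T-2t+1} \\ \varepsilon_{1,1} & \varepsilon_{1,2} & \cdots & \varepsilon_{1,T-2t+1} \\ \vdots & & & \vdots \\ \varepsilon_{t-1,1} & \varepsilon_{t-1,2} & \cdots & \varepsilon_{t-1,T-2t+1} \end{pmatrix}, \qquad Z = \begin{pmatrix} z_0 & z_1 & \cdots & z_{t-1} \\ z_1 & z_2 & \cdots & z_t \\ \vdots & & & \vdots \\ z_{T-2t+1} & z_{T-2t+2} & \cdots & z_{T-t} \end{pmatrix}.$$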


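Let $Z_p$ be the $(p+1)$-th column of $Z$. Then

$$\mathrm{vec}(H) = \begin{pmatrix} \varepsilon Z_0 \\ \varepsilon Z_1 \\ \vdots \\ \varepsilon Z_{t-1} \end{pmatrix}.$$

The $(p, q)$-th block of $\Sigma_H$ is given by

$$\Sigma_H^{[p,q]} = \mathbb E \big[ \varepsilon Z_p Z_q^t \varepsilon^t \big],$$

where for $p < q$

$$\mathbb E \big[ Z_p Z_q^t \big] = \begin{pmatrix} 0 & 0 \\ I_{T-2t+1-(q-p)} & 0 \end{pmatrix}.$$

Here, $I_{T-2t+1-(q-p)}$ denotes the identity matrix of dimension $T - 2t + 1 - (q - p)$.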
Partitioning e appropriately as 


(P 

0 / • 


e= i £ [l,l] £ [1,2] 

V £ [2,l] £[2,2] 


we deduce that 


yH 


= E 


£[i,i] £[1,2] j ( 0 

£[2,1] £[2,2]/ \lT-2t+l-{q-p) 0/ \£ 


0 \ / £ ri,l] £[2,1] 


[ 1 , 2 ] [ 2 , 2 ] 


E r -[l,2]£|i,i] £[1,2]£[2,1] 

" V £[2,2]£[1,1] £[2,2]£[2,1] 


= 0 


for $p < q$. Similarly, we can show that $\Sigma_H^{[p,q]} = 0$ for $p > q$. As for $p = q$, we have $\mathbb E[Z_p Z_p^t] = I_{T-2t+1}$.
Thus

$$\Sigma_H^{[p,p]} = \mathbb E[\varepsilon \varepsilon^t] = (T - 2t + 1) I_t.$$

It then follows that $\Sigma_H = (T - 2t + 1) I_{t(T-2t+1)}$.

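6.3.3. Consequences for $\mathcal M$, $\mathcal S$ and $\mathcal T$. Recall that $\mathcal M$ denotes the operator defined by

(6.28) $\mathcal M = \mathrm{Mat}\big(\Sigma_H^{-1/2} \mathrm{vec}(\cdot)\big)$

and $\mathcal M^{-*}$ denotes the adjoint of the inverse of $\mathcal M$. Using the results of Section 6.3.2, we obtain that

(6.29) $\mathcal M = \frac{1}{\sqrt{T - 2t + 1}}\, \mathrm{Id}$

and

(6.30) $\mathcal M^{-*} = \sqrt{T - 2t + 1}\, \mathrm{Id}.$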

Using these results, we obtain that S is the operator defined by 

1 


S(-) i-A 


and T is the mapping 


yJT - 2t + 1 
1 _1 


• E -1 / 2 . 
We thus have the following results on $\mathcal T$:


mi < 


v° max (s) 

Vi 


and 


We also obtain that 


Omin (T) — 




V °"min(^) 

Vi ' 

V °"min ( S ) 
VT-2t + l 


6.4. Some properties of the $\chi^2$ distribution. We recall the following useful bounds for the $\chi^2(\nu)$
distribution with $\nu$ degrees of freedom.

Lemma 6.2. [4, Lemma B.1] The following bounds hold:

$$\mathbb P \big( \chi(\nu) \ge \sqrt{\nu} + \sqrt{2t} \big) \le \exp(-t),$$

$$\mathbb P \big( \chi(\nu) \le \sqrt{u \nu} \big) \le \frac{2}{\sqrt{\pi \nu}}\, (u e / 2)^{\nu/4}.$$